We propose two estimators of a monotone spectral density that are based on the periodogram:
the isotonic regression of the periodogram and the isotonic regression of the log-periodogram.
We derive pointwise limit distribution results for the proposed estimators for short memory
linear processes and long memory Gaussian processes, and we also show that the estimators are
rate optimal.
1. Introduction. The motivation for doing spectral analysis of stationary time series
comes from the need to analyze the frequency content in the signal. The frequency content
can for instance be described by the spectral density of the process, defined below. One could
be interested in looking for a few dominant frequencies or frequency regions, which correspond
to multimodality in the spectral density. Inference methods for multimodal spectral densities
have been treated in [5], using the taut string method. A simpler problem is that of fitting
a unimodal spectral density, i.e. the situation when there is only one dominant frequency,
which can be known or unknown, corresponding to known or unknown mode, respectively. In this
paper we treat unimodal spectral density estimation for known mode. A spectral density that is
decreasing on $[0, \pi]$ is a model for the frequency content in the signal being ordered. A
unimodal spectral density is a model for there being one major frequency component, with a
decreasing amount of other frequency components seen as a function of the distance to the
major frequency.

Imposing monotonicity (or unimodality) means that one takes a nonparametric approach,
since the set of monotone (or unimodal) spectral densities is infinite-dimensional.
A parametric problem that is contained in our estimation problem is that of a power law
spectrum, i.e. when one assumes that the spectral density decreases as a power function
$f(u) \sim C u^{-\beta}$ for $u \in (0, \pi)$, with unknown exponent $\beta$. Power law spectra
seem to have important applications to physics, astronomy and medicine: four different
applications mentioned in [16] are a) fluctuations in the Earth's rate of rotation cf. [20],
b) voltage fluctuations across cell membranes cf. [10], c) time series of impedances of rock
layers in boreholes cf. e.g. [13] and d) x-ray time variability of galaxies cf. [17]. We propose
to use a nonparametric approach as an alternative to the power law spectrum methods used in
these applications. There are (at least) two reasons why this could make sense. Firstly, the
reason for using a power function, e.g. to model the spectrum in the background radiation, is
(at best) a theoretical consideration exploiting physical theory and leading to the power
function as a good approximation. However, this is a stronger model assumption to impose on
the data than merely imposing monotonicity, and thus one could imagine a wider range of
situations that should be possible to analyze using our methods. Secondly, fitting a power law
spectral model to data consists of doing linear regression of the log periodogram; if the data
are not very well aligned along a straight line (after a log-transformation) this could
influence the overall fit. A nonparametric approach, in which one assumes only monotonicity,
is more robust against possible misfit.

Sometimes one assumes a piecewise power law spectrum, cf. [21], as a model. Our methods 
are well adapted to these situations when the overall function behaviour is that of a decreasing 
function. 

Furthermore there seem to be instances in the literature where a monotonically decreasing
(or monotonically increasing) spectral density is both implicitly assumed as a model and
seems feasible. Two examples in [22] (cf. e.g. Figures 20 and 21 in [22]) are e)
the wind speed in a certain direction at a certain location measured every 0.025 second (for
which a decreasing spectral density seems to be feasible) and f) the daily record of how well an
atomic clock keeps time on a day to day basis (which seems to exhibit an increasing spectral
density). The methods utilized in [22] are smoothing of the periodogram. We propose to use
an order-restricted estimator of the spectral density, and would like to claim that this is better
adapted to the situations at hand.

Decreasing spectral densities can arise when one observes a sum of several parametric
time series, for instance AR(1) processes with coefficient $|a| < 1$; the interest of the
nonparametric method in that case is that one does not have to know how many AR(1) processes
are summed up. Another parametric example is an ARFIMA(0,d,0) process with $0 < d < 1/2$, which
has a decreasing spectral density, observed with added white noise, or even with one
(or several) added AR(1) processes; the resulting time series will have a decreasing spectral
density. Our methods are well adapted to this situation, and we will illustrate the
nonparametric methods on simulated data from such parametric models.

The spectral measure of a weakly stationary process is the positive measure $\sigma$ on
$[-\pi, \pi)$ characterized by the relation

$$\operatorname{cov}(X_0, X_k) = \int_{-\pi}^{\pi} e^{ikx} \, \sigma(dx) .$$

The spectral density, when it exists, is the density of $\sigma$ with respect to Lebesgue
measure. It is an even nonnegative integrable function on $[-\pi, \pi]$. Define the spectral
distribution function on $[-\pi, \pi]$ by

$$F(\lambda) = \int_0^{\lambda} f(u) \, du , \qquad 0 \le \lambda \le \pi ,$$

$$F(\lambda) = -F(-\lambda) , \qquad -\pi \le \lambda < 0 .$$
An estimate of the spectral density is given by the periodogram 

The spectral distribution function is estimated by the empirical spectral distribution function

$$F_n(\lambda) = \int_0^{\lambda} I_n(u) \, du .$$

Functional central limit theorems for $F_n$ have been established in [4] and [18]. However, since
differentiation is not a smooth map, the properties of $F_n$ do not transfer to $I_n$, and
furthermore it is well known that the periodogram is not even a consistent estimate of the
spectral density.
The standard remedy for obtaining consistency is to use kernel smoothers. This however 
entails a bandwidth choice, which is somewhat ad hoc. The assumption of monotonicity allows 
for the construction of adaptive estimators that do not need a pre-specified bandwidth. 
We will restrict our attention to the class of non increasing functions. 

Definition 1. Let $\mathcal{F}$ be the convex cone of integrable, monotone non increasing
functions on $(0, \pi]$.

Given a stationary sequence $\{X_k\}$ with spectral density $f$, the goal is to estimate $f$
under the assumption that it lies in $\mathcal{F}$. We suggest two estimators, which are the
$L^2$ orthogonal projections on the convex cone $\mathcal{F}$ of the periodogram and of the
log-periodogram, respectively.

(i) The $L^2$ minimum distance estimate between the periodogram and $\mathcal{F}$ is defined as

(1) $$\hat f_n = \arg\min_{z \in \mathcal{F}} Q(z) ,$$

with

$$Q(z) = \int_0^{\pi} (I_n(s) - z(s))^2 \, ds .$$

This estimator of the spectral density naturally yields a corresponding estimator $\hat F_n$ of
the spectral distribution function $F$, defined by

(2) $$\hat F_n(t) = \int_0^{t} \hat f_n(s) \, ds .$$

(ii) The $L^2$ minimum distance estimate between the log-periodogram (often called the
cepstrum) and the "logarithm of $\mathcal{F}$" is defined as

(3) $$\tilde f_n = \exp \arg\min_{z \in \mathcal{F}} \tilde Q(z) ,$$

with

$$\tilde Q(z) = \int_0^{\pi} \{\log I_n(s) + \gamma - \log z(s)\}^2 \, ds ,$$

where $\gamma$ is Euler's constant. To understand the occurrence of the centering $-\gamma$,
recall that if $\{X_n\}$ is a Gaussian white noise sequence with variance $\sigma^2$, then its
spectral density is $\sigma^2/(2\pi)$ and the distribution of $I_n(s)/(\sigma^2/2\pi)$ is a
standard exponential (i.e. one half of a chi-square with two degrees of freedom), and it is well
known that if $Z$ is a standard exponential, then $E[\log Z] = -\gamma$ and
$\operatorname{var}(\log Z) = \pi^2/6$, see e.g. [12]. The log-spectral density is of
particular interest in the context of long range dependent time series, i.e. when the spectral
density has a singularity at some frequency and might not be square integrable, though it is
always integrable by definition. For instance, the spectral density of an ARFIMA(0,d,0)
process is $f(x) = \sigma^2 |1 - e^{ix}|^{-2d}$, with $d \in (-1/2, 1/2)$. It is decreasing on
$(0, \pi]$ for $d \in (0, 1/2)$ and not square integrable for $d \in (1/4, 1/2)$. In this
context, for technical reasons, we will take $I_n$ to be a step function changing value
at the so-called Fourier frequencies $\lambda_k = 2\pi k/n$.
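The centering constants quoted above can be checked numerically. The sketch below is only an illustration, not part of the paper's method: it uses the identity $E[Z^s] = \Gamma(1+s)$ for a standard exponential $Z$, so that $E[\log Z] = \Gamma'(1)$ and $E[(\log Z)^2] = \Gamma''(1)$, and approximates both derivatives by central finite differences. The function name and the step size `h` are choices made here for illustration.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler's constant


def gamma_derivatives_at_one(h=1e-4):
    """Central finite differences for Gamma'(1) and Gamma''(1)."""
    g_plus = math.gamma(1.0 + h)
    g_mid = math.gamma(1.0)
    g_minus = math.gamma(1.0 - h)
    first = (g_plus - g_minus) / (2.0 * h)
    second = (g_plus - 2.0 * g_mid + g_minus) / h**2
    return first, second


# For Z standard exponential, E[Z**s] = Gamma(1 + s); differentiating in s at 0
# gives E[log Z] = Gamma'(1) and E[(log Z)**2] = Gamma''(1).
mean_log, second_moment = gamma_derivatives_at_one()
var_log = second_moment - mean_log**2

print("E[log Z]   ~", mean_log, " vs -gamma =", -EULER_GAMMA)
print("var(log Z) ~", var_log, " vs pi^2/6 =", math.pi**2 / 6)
```

The two printed pairs should agree to several decimal places, which is the reason the log-periodogram is shifted by $\gamma$ before the projection.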
The paper is organized as follows. In Section 2 we derive the algorithms for the estimators
$\hat f_n$, $\hat F_n$ and $\tilde f_n$. In Section 3 we derive a lower bound for the asymptotic
local minimax risk in monotone spectral density estimation, and show that the rate is not faster
than $n^{-1/3}$. In Section 4 we derive the pointwise limit distributions for the proposed
estimators. The limit distribution of $\hat f_n$ (suitably centered and normalized) is derived
for a linear process. The asymptotic distribution is that of the slope at 0 of the least concave
majorant of a quadratic function plus a two-sided Brownian motion. Up to constants, this
distribution is the so-called Chernoff's distribution, see [8], which turns up in many
situations in monotone function estimation, see e.g. [23] for monotone density estimation and
[27] for monotone regression function estimation. The limit distribution for $\tilde f_n$ is
derived for a Gaussian process, and is similar to the result for $\hat f_n$. Section 5 contains
a simulation study with plots of the estimators. Section 6 contains the proofs of the limit
distribution results (Theorems 5 and 6).

2. Identification of the estimators. Let $h$ be a function defined on a compact interval
$[a, b]$. The least concave majorant $T(h)$ of $h$ and its derivative $T(h)'$ are defined by

$$T(h) = \arg\min\{z : z \ge h, \ z \text{ concave}\} ,$$

$$T(h)'(t) = \min_{u < t} \max_{v \ge t} \frac{h(v) - h(u)}{v - u} .$$

By definition, $T(h)(t) \ge h(t)$ for all $t \in [a, b]$ and it is also clear that
$T(h)(a) = h(a)$, $T(h)(b) = h(b)$. Since $T(h)$ is concave, it is everywhere left and right
differentiable, $T(h)'$ as defined above coincides with the left derivative of $T(h)$, and
$T(h)(t) = T(h)(a) + \int_a^t T(h)'(s) \, ds$ (see for instance Hörmander [11, Theorem 1.1.9]).
We will also need the following result.

Lemma 1. If $h$ is continuous, then the support of the Stieltjes measure $dT(h)'$ is included
in the set $\{T(h) = h\}$.

Proof. Since $h$ and $T(h)$ are continuous and $T(h)(a) - h(a) = T(h)(b) - h(b) = 0$, the
set $\{T(h) > h\}$ is open. Thus it is a union of open intervals. On such an interval, $T(h)$ is
linear, since otherwise it would be possible to build a concave majorant of $h$ that would be
strictly smaller than $T(h)$ on some smaller open subinterval. Hence $T(h)'$ is piecewise
constant on the open set $\{T(h) > h\}$, so that the support of $dT(h)'$ is included in the
closed set $\{T(h) = h\}$. □

The next lemma characterizes the least concave majorant as the solution of a quadratic
optimization problem. For any integrable function $g$, define the function $\bar g$ on
$[0, \pi]$ by

$$\bar g(t) = \int_0^t g(s) \, ds .$$

Lemma 2. Let $g \in L^2([0, \pi])$. Let $G$ be defined on $L^2([0, \pi])$ by

$$G(f) = \|f - g\|_2^2 = \int_0^{\pi} \{f(s) - g(s)\}^2 \, ds .$$

Then $\arg\min_{f \in \mathcal{F}} G(f) = T(\bar g)'$.

This result seems to be well known. It is cited e.g. in [15, p. 726] but since we have not
found a proof, we give one for completeness.

Let $G : \mathcal{F} \to \mathbb{R}$ be an arbitrary functional. It is called Gateaux
differentiable at the point $f \in \mathcal{F}$ if the limit

$$G'_f(h) = \lim_{t \to 0} \frac{G(f + th) - G(f)}{t}$$

exists for every $h$ such that $f + th \in \mathcal{F}$ for small enough $t$.

Proof of Lemma 2. Denote $G(f) = \|f - g\|_2^2$ and $\hat f = T(\bar g)'$. The Gateaux
derivative of $G$ at $\hat f$ in the direction $h$ is

$$G'_{\hat f}(h) = 2 \int_0^{\pi} h(t) \{\hat f(t) - g(t)\} \, dt .$$

By integration by parts, and using that $T(\bar g)(\pi) - \bar g(\pi) = T(\bar g)(0) - \bar g(0) = 0$,
for any function of bounded variation $h$, we have

(4) $$G'_{\hat f}(h) = -2 \int_0^{\pi} \{T(\bar g)(t) - \bar g(t)\} \, dh(t) .$$

By Lemma 1, the support of the measure $d\hat f$ is included in the closed set
$\{T(\bar g) = \bar g\}$, thus

(5) $$G'_{\hat f}(\hat f) = -2 \int_0^{\pi} \{T(\bar g)(t) - \bar g(t)\} \, d\hat f(t) = 0 .$$

If $h = f - \hat f$, with $f$ monotone non increasing, (4) and (5) imply that

(6) $$G'_{\hat f}(f - \hat f) = -2 \int_0^{\pi} \{T(\bar g)(t) - \bar g(t)\} \, df(t) \ge 0 .$$

Let $f \in \mathcal{F}$ be arbitrary and let $u$ be the function defined on $[0, 1]$ by
$u(t) = G(\hat f + t(f - \hat f))$. Then $u$ is convex and
$u'(0) = G'_{\hat f}(f - \hat f) \ge 0$ by (6). Since $u$ is convex, if $u'(0) \ge 0$, then
$u(1) \ge u(0)$, i.e. $G(f) \ge G(\hat f)$. This proves that
$\hat f = \arg\min_{f \in \mathcal{F}} G(f)$. □

Since $\hat f_n$ and $\log \tilde f_n$ are the minimizers of the $L^2$ distance to $I_n$ and
$\log(I_n) + \gamma$, respectively, over the convex cone of monotone functions, we can apply
Lemma 2 to derive characterizations of $\hat f_n$ and $\tilde f_n$.

Theorem 3. Let $\hat f_n$, $\hat F_n$ and $\tilde f_n$ be defined in (1), (2) and (3),
respectively. Then

$$\hat f_n = T(F_n)' , \qquad \hat F_n = T(F_n) , \qquad \tilde f_n = \exp\{T(\tilde F_n)'\} ,$$

where

$$F_n(t) = \int_0^t I_n(u) \, du , \qquad \tilde F_n(t) = \int_0^t (\log I_n(u) + \gamma) \, du .$$

Standard and well known algorithms for calculating the map $y \mapsto T(y)'$ are the pool
adjacent violators algorithm (PAVA), the minimum lower set algorithm (MLSA) and the
min-max formulas, cf. [24]. Since the maps $T$ and $T'$ act on functions of a continuous
argument, the algorithms PAVA and MLSA in fact give approximations that solve the discrete
versions of our problems, replacing the integrals in $Q$ and $\tilde Q$ with approximating
Riemann sums. Note that the resulting estimators are order-restricted means; the discrete
approximations entail that these are approximated as sums instead of integrals. The
approximation errors are similar to the ones obtained e.g. for the methods in [15] and [1].
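To make the link between PAVA and the min-max formula concrete, here is a minimal sketch for the discrete, equally weighted case. It is an illustration written for this section, not the authors' code: `pava_nonincreasing` pools adjacent violators, and `lcm_slopes` evaluates the min-max formula for the left slopes of the least concave majorant of the cumulative sums. Both function names are hypothetical.

```python
def pava_nonincreasing(y):
    """Least-squares fit of a non increasing sequence to y (equal weights)."""
    blocks = []  # each block stores [sum of values, number of values]
    for value in y:
        blocks.append([value, 1])
        # Pool while the previous block mean is smaller than the last block mean.
        while (len(blocks) > 1
               and blocks[-2][0] * blocks[-1][1] < blocks[-1][0] * blocks[-2][1]):
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    fit = []
    for total, count in blocks:
        fit.extend([total / count] * count)
    return fit


def lcm_slopes(y):
    """Min-max formula: left slopes of the least concave majorant of cumsum(y)."""
    n = len(y)
    cum = [0.0]
    for value in y:
        cum.append(cum[-1] + value)
    return [
        min(
            max((cum[v] - cum[u]) / (v - u) for v in range(k, n + 1))
            for u in range(k)
        )
        for k in range(1, n + 1)
    ]


data = [3.0, 1.0, 2.0, 5.0, 0.5, 1.5]
print(pava_nonincreasing(data))
print(lcm_slopes(data))
```

Both calls return the same non increasing sequence. PAVA runs in linear time, whereas the direct min-max evaluation above is cubic and is only meant as a cross-check on small inputs.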
3. Lower bound for the local asymptotic minimax risk. We establish a lower
bound for the minimax risk when estimating a monotone spectral density at a fixed point.
This result will be proved by looking at parametrized subfamilies of spectral densities in an
open set of densities on $\mathbb{R}^n$; the subfamilies can be seen as (parametrized) curves in
the set of monotone spectral densities. The topology used will be the one generated by the metric

$$\rho(f, g) = \int_{-\pi}^{\pi} |f(x) - g(x)| \, dx$$

for $f, g$ spectral density functions on $[-\pi, \pi]$. Note first that the distribution of a
stochastic process is not uniquely defined by the spectral density. To accommodate this, let
$\mathcal{L}_g$ be the set of all laws of stationary processes (i.e. the translation invariant
probability distributions on $\mathbb{R}^{\infty}$) with spectral density $g$.

Let $\epsilon > 0$ and $c_1, c_2$ be given finite constants and let $t_0 > 0$, the point at
which we want to estimate the spectral density, be given.

Definition 2. For each $n \in \mathbb{Z}$ let $\mathcal{G}^1 := \mathcal{G}^1(\epsilon, c_1, c_2, t_0)$
be a set of monotone $C^1$ spectral densities $g$ on $[0, \pi]$, such that

(7) $$\sup_{|t - t_0| < \epsilon} g'(t) < 0 ,$$

(8) $$c_1 < \inf_{|t - t_0| < \epsilon} g(t) < \sup_{|t - t_0| < \epsilon} g(t) < c_2 .$$

Theorem 4. For every open set $U$ in $\mathcal{G}^1$ there is a positive constant $c(U)$ such that

$$\liminf_{n \to \infty} \inf_{T_n} \sup_{g \in U} \sup_{L \in \mathcal{L}_g} n^{2/3} E_L[(T_n - g(t_0))^2] \ge c(U) ,$$

where the infimum is taken over all functions $T_n$ of the data.

Proof. Let $k$ be a fixed real valued continuously differentiable function, with support
$[-1, 1]$, such that $\int k(t) \, dt = 0$, $k(0) = 1$ and $\sup |k(t)| \le 1$. Then, since $k'$
is continuous with compact support, $|k'| \le C$ for some constant $C < \infty$.

For fixed $h > 0$, define a parametrized family of spectral densities $g_\theta$ by

$$g_\theta(t) = g(t) + \theta k\left(\frac{t - t_0}{h}\right) .$$

Obviously, $\{g_\theta\}_{\theta \in \Theta}$ are $C^1$ functions. Since

$$g'_\theta(t) = g'(t) + \frac{\theta}{h} k'\left(\frac{t - t_0}{h}\right)$$

and since $k'$ is bounded, we have that, for $|t - t_0| < \epsilon$, $g'_\theta(t) < 0$ if
$|\theta/h| < \delta$, for some $\delta = \delta(C) > 0$. Thus, in order to make the
parametrized spectral densities $g_\theta$ strictly decreasing in the neighbourhood
$\{t : |t - t_0| < \epsilon\}$, the parameter space for $\theta$ should be chosen as

$$\Theta = (-\delta h, \delta h) .$$

We will use the van Trees inequality (cf. Gill and Levit [7, Theorem 1]) for the estimand
$g_\theta(t_0) = g(t_0) + \theta$. Let $\lambda$ be an arbitrary prior density on $\Theta$.
Then, for sufficiently small $\delta$, $\{g_\theta : \theta \in \Theta\} \subset U$ (cf. the
definition of the metric $\rho$). Let $P_\theta$ denote the distribution of a Gaussian process
with spectral density $g_\theta$, and $E_\theta$ the corresponding expectation. Then

$$\sup_{g \in U} \sup_{L \in \mathcal{L}_g} E_L[(T_n - g(t_0))^2] \ge \sup_{\theta \in \Theta} E_\theta[(T_n - g_\theta(t_0))^2] \ge \int_\Theta E_\theta[(T_n - g_\theta(t_0))^2] \lambda(\theta) \, d\theta .$$

Then, by the van Trees inequality, we obtain

(9) $$\int_\Theta E_\theta[(T_n - g_\theta(t_0))^2] \lambda(\theta) \, d\theta \ge \frac{1}{\int_\Theta I_n(\theta) \lambda(\theta) \, d\theta + I(\lambda)} ,$$

where

$$I_n(\theta) = \frac{1}{2} \operatorname{tr}\left(\{M_n^{-1}(g_\theta) M_n(\partial_\theta g_\theta)\}^2\right)$$

is the Fisher information matrix, cf. [6], with respect to the parameter $\theta$ of a Gaussian
process with spectral density $g_\theta$, and for any even nonnegative integrable function
$\phi$ on $[-\pi, \pi)$, $M_n(\phi)$ is the Toeplitz matrix of order $n$:

$$M_n(\phi)_{i,j} = \int_{-\pi}^{\pi} \phi(x) \cos((i - j)x) \, dx .$$

For any $n \times n$ nonnegative symmetric matrix $A$, define the spectral radius of $A$ as

$$\rho(A) = \sup\{u^t A u \mid u^t u = 1\} ,$$

where $u^t$ denotes transposition of the vector $u$, so that $\rho(A)$ is the largest eigenvalue
of $A$. Then, for any $n \times n$ matrix $B$, it holds that
$\operatorname{tr}(AB) \le \rho(A) \operatorname{tr}(B)$. If $\phi$ is bounded away from zero,
say $\phi(x) \ge a > 0$ for all $x \in [-\pi, \pi]$, then $\rho(M_n^{-1}(\phi)) \le a^{-1}$. By
the Parseval-Bessel inequality,

$$\operatorname{tr}(\{M_n(\phi)\}^2) \le n \int_{-\pi}^{\pi} \phi^2(x) \, dx .$$

Thus, if $g$ is bounded below, then $I_n(\theta)$ is bounded by some constant times

$$n \int k^2((t - t_0)/h) \, dt = nh \int k^2(t) \, dt .$$



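In order to get an expression for $I(\lambda)$, let $\lambda_0$ be an arbitrary density on
$(-1, 1)$, and define the prior density on $\Theta = (-\delta h, \delta h)$ as
$\lambda(\theta) = \frac{1}{\delta h} \lambda_0\left(\frac{\theta}{\delta h}\right)$. Then

$$I(\lambda) = \int_\Theta \frac{\lambda'(\theta)^2}{\lambda(\theta)} \, d\theta = \frac{1}{\delta^2 h^2} \int_{-1}^{1} \frac{\lambda_0'(u)^2}{\lambda_0(u)} \, du = \frac{I_0}{\delta^2 h^2} .$$

Finally, plugging the previous bounds into (9) yields, for large enough $n$,

$$\sup_{g \in U} \sup_{L \in \mathcal{L}_g} E_L[(T_n - g(t_0))^2] \ge \frac{1}{n h c_3 + I_0 \delta^{-2} h^{-2}} ,$$

which, if $h = n^{-1/3}$, becomes

$$\sup_{g \in U} \sup_{L \in \mathcal{L}_g} E_L[(T_n - g(t_0))^2] \ge c_4 n^{-2/3} ,$$

for some positive constant $c_4$. This completes the proof of Theorem 4. □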
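The choice $h = n^{-1/3}$ can be read as a balancing argument. The worked step below is a sketch under the assumption that the denominator of the lower bound has the form $n h c_3 + c_5 h^{-2}$, where $c_5$ is a hypothetical shorthand (not in the original) for the constant contributed by the prior information $I(\lambda)$.

```latex
% Take h = n^{-a}. The two terms of the denominator grow like
%   n h = n^{1-a}   and   h^{-2} = n^{2a},
% so the denominator is of order n^{\max(1-a,\,2a)}, which is smallest when 1 - a = 2a.
\[
  1 - a = 2a \iff a = \tfrac{1}{3},
  \qquad
  \left.\frac{1}{n h c_3 + c_5 h^{-2}}\right|_{h = n^{-1/3}}
  = \frac{1}{c_3 n^{2/3} + c_5 n^{2/3}}
  = \frac{n^{-2/3}}{c_3 + c_5} .
\]
```

Any other exponent $a$ makes one of the two terms dominate and gives a smaller lower bound, which is why the squared-error rate $n^{-2/3}$ (that is, $n^{-1/3}$ for the estimator itself) appears.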
4. Limit distribution results. We next derive the limit distributions for $\hat f_n$ and
$\tilde f_n$ under general assumptions. The main tools used are local limit distributions for
the rescaled empirical spectral distribution function $F_n$ and empirical log-spectral
distribution function $\tilde F_n$, respectively, as well as maximal bounds for the rescaled
processes. These will be coupled with smoothness results for the least concave majorant map
established in Anevski and Hössjer [1, Theorems 1 and 2]. The proofs are postponed to Section 6.

4.1. The limit distribution for the estimator $\hat f_n$.

Assumption 1. The process $\{X_i, i \in \mathbb{Z}\}$ is linear with respect to an i.i.d.
sequence $\{\epsilon_i, i \in \mathbb{Z}\}$ with zero mean and unit variance, i.e.

(10) $$X_k = \sum_{j=0}^{\infty} a_j \epsilon_{k-j} ,$$

where the sequence $\{a_j\}$ satisfies

oo 

(11) E^N+^JX^- 

Remark. Condition (11) is needed to deal with remainder terms and apply the results 
of [18] and [3]. It is implied for instance by the simpler condition 

oo 

(12) jy/'foKoo. 

i=i 

It is satisfied by most usual linear time series such as causal invertible ARMA processes. 



The spectral density of the process $\{X_i\}$ is given by
$f(u) = (2\pi)^{-1} \left| \sum_{j=0}^{\infty} a_j e^{iju} \right|^2$. Unfortunately, there is no
explicit condition on the coefficients $a_j$ that implies monotonicity of $f$, but the
coefficients $a_j$ are not of primary interest here.

The limiting distribution of the estimator will be expressed in terms of the so-called
Chernoff's distribution, i.e. the law of a random variable $\zeta$ defined by
$\zeta = \arg\max_{s \in \mathbb{R}} \{W(s) - s^2\}$, where $W$ is a standard two sided Brownian
motion. See [8] for details about this distribution.

Theorem 5. Let $\{X_i\}$ be a linear process such that (10) and (11) hold and
$E[\epsilon_0^8] < \infty$. Assume that its spectral density $f$ belongs to $\mathcal{F}$.
Assume $f'(t_0) < 0$ at the fixed point $t_0$. Then, as $n \to \infty$,

$$n^{1/3} (\hat f_n(t_0) - f(t_0)) \xrightarrow{d} 2 \{-\pi f^2(t_0) f'(t_0)\}^{1/3} \zeta .$$

4.2. The limit distributions for the estimator $\tilde f_n$. In this section, in order to deal
with the technicalities of the log-periodogram, we make the following assumption.

Assumption 2. The process $\{X_k\}$ is Gaussian. Its spectral density $f$ is monotone on
$(0, \pi]$ and can be expressed as $f(x) = |1 - e^{ix}|^{-2d} f^*(x)$, with $|d| < 1/2$, where
$f^*$ is bounded above and away from zero and there exists a constant $C$ such that for all
$x, y \in (0, \pi]$,

$$|f(x) - f(y)| \le C \frac{|x - y|}{x \wedge y} .$$
Remark. This condition is usual in the long memory literature. Similar conditions are
assumed in [25, Assumption 2], [19, Assumption 2], [26, Assumption 1] (with a typo). It is used
to derive covariance bounds for the discrete Fourier transform ordinates of the process, which
yield covariance bounds for non linear functionals of the periodogram ordinates in the
Gaussian case. It is satisfied by usual long memory processes such as causal invertible
ARFIMA(p,d,q) processes with possibly an additive independent white noise or AR(1) process.

Recall that
$\tilde f_n = \exp \arg\min_{f \in \mathcal{F}} \int_0^{\pi} \{\log f(s) - \log I_n(s) - \gamma\}^2 \, ds$,
where $\gamma$ is Euler's constant and $I_n$ is the periodogram, defined here as a step function:

$$I_n(t) = I_n(2\pi [nt/2\pi]/n) = \frac{2\pi}{n} \left| \sum_{k=1}^{n} X_k e^{i 2k\pi [nt/2\pi]/n} \right|^2 .$$



Theorem 6. Let $\{X_i\}$ be a Gaussian process that satisfies Assumption 2. Assume
$f'(t_0) < 0$ at the fixed point $t_0 \in (0, \pi)$. Then, as $n \to \infty$,

$$n^{1/3} \{\log \tilde f_n(t_0) - \log f(t_0)\} \xrightarrow{d} 2 \left( \frac{-\pi^4 f'(t_0)}{3 f(t_0)} \right)^{1/3} \zeta .$$

Corollary 7. Under the assumptions of Theorem 6,

$$n^{1/3} \{\tilde f_n(t_0) - f(t_0)\} \xrightarrow{d} 2 \{-\pi^4 f^2(t_0) f'(t_0)/3\}^{1/3} \zeta .$$

Remark. This is the same limiting distribution as in Theorem 5, up to the constant
$3^{-1/3} \pi > 1$. Thus the estimator $\tilde f_n$ is less efficient than the estimator
$\hat f_n$, but the interest of $\tilde f_n$ is to be used when long memory is suspected, i.e.
the spectral density exhibits a singularity at zero, and the assumptions of Theorem 5 are not
satisfied.
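The efficiency loss stated in the remark can be checked with a few lines of arithmetic. The sketch below assumes the limiting constants are $2\{-\pi f^2(t_0) f'(t_0)\}^{1/3}$ for the periodogram-based estimator (Theorem 5) and $2\{-\pi^4 f^2(t_0) f'(t_0)/3\}^{1/3}$ for the log-periodogram-based one (Corollary 7); the function names and the numerical values of $f(t_0)$ and $f'(t_0)$ are hypothetical.

```python
import math


def scale_periodogram(f, fprime):
    """Constant multiplying Chernoff's variable in Theorem 5 (needs fprime < 0)."""
    return 2.0 * (-math.pi * f**2 * fprime) ** (1.0 / 3.0)


def scale_log_periodogram(f, fprime):
    """Constant multiplying Chernoff's variable in Corollary 7 (needs fprime < 0)."""
    return 2.0 * (-(math.pi**4) * f**2 * fprime / 3.0) ** (1.0 / 3.0)


# Hypothetical values of f(t0) and f'(t0); the ratio does not depend on them.
f_t0, fprime_t0 = 1.3, -0.7
ratio = scale_log_periodogram(f_t0, fprime_t0) / scale_periodogram(f_t0, fprime_t0)
print(ratio, math.pi / 3 ** (1.0 / 3.0))
```

Both printed numbers are about 2.18, so under these constants the limiting spread of the log-periodogram estimator is roughly twice that of the periodogram estimator, whatever the values of $f(t_0)$ and $f'(t_0)$.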

5. Simulations and finite sample behaviour of estimators. In this section we apply
the nonparametric methods on simulated time series data of sums of parametric models. The
algorithms used for the calculation of $\hat f_n$ and $\tilde f_n$ are the discrete versions of
the estimators, obtained by doing isotonic regression of the data
$\{(\lambda_k, I_n(\lambda_k)), k = 1, \dots, [(n - 1)/2]\}$ where $\lambda_k = 2\pi k/n$. For
instance the discrete version of $\hat f_n$ is calculated as

$$\hat f_n^{d} = \arg\min_{z \in \mathcal{F}} \sum_{k=1}^{[(n-1)/2]} (I_n(\lambda_k) - z(\lambda_k))^2 .$$

Note that the limit distribution for $\tilde f_n$ is stated for the discrete version
$\tilde f_n^{d}$. The simulations were done in R, using the "fracdiff" package. The code is
available from the corresponding author upon request.
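The discrete estimators can be sketched in a few lines. The code below is not the authors' R code; it is a minimal Python illustration under stated assumptions: the periodogram is normalized as $|\sum_t x_t e^{-it\lambda}|^2/(2\pi n)$ (an assumption made here), the isotonic regression uses equal weights, and the helper names (`periodogram`, `pava_nonincreasing`, `monotone_estimates`) and the AR(1) toy data are hypothetical.

```python
import cmath
import math
import random

EULER_GAMMA = 0.5772156649015329


def periodogram(x):
    """Periodogram at the Fourier frequencies 2*pi*k/n, k = 1..[(n-1)/2].

    Assumed normalization: |sum_t x_t exp(-i t lambda)|^2 / (2 pi n).
    """
    n = len(x)
    values = []
    for k in range(1, (n - 1) // 2 + 1):
        lam = 2.0 * math.pi * k / n
        dft = sum(x_t * cmath.exp(-1j * lam * t) for t, x_t in enumerate(x, start=1))
        values.append(abs(dft) ** 2 / (2.0 * math.pi * n))
    return values


def pava_nonincreasing(y):
    """Equal-weight least-squares projection of y on non increasing sequences."""
    blocks = []  # [sum, count] per pooled block
    for value in y:
        blocks.append([value, 1])
        while (len(blocks) > 1
               and blocks[-2][0] * blocks[-1][1] < blocks[-1][0] * blocks[-2][1]):
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    return [total / count for total, count in blocks for _ in range(count)]


def monotone_estimates(x):
    """Return (periodogram, isotonic periodogram, exp of isotonic log-periodogram)."""
    i_n = periodogram(x)
    f_hat = pava_nonincreasing(i_n)
    log_fit = pava_nonincreasing([math.log(v) + EULER_GAMMA for v in i_n])
    f_tilde = [math.exp(v) for v in log_fit]
    return i_n, f_hat, f_tilde


# Toy data: a Gaussian AR(1) series with coefficient 0.7 (after a burn-in period).
rng = random.Random(0)
a, n = 0.7, 256
x, prev = [], 0.0
for _ in range(n + 100):
    prev = a * prev + rng.gauss(0.0, 1.0)
    x.append(prev)
x = x[100:]

i_n, f_hat, f_tilde = monotone_estimates(x)
print(f_hat[:3], f_tilde[:3])
```

The direct sum in `periodogram` costs order $n^2$ operations; for long series an FFT would be the natural replacement, which is left out here to keep the sketch dependency-free.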

Example 1. The first example consists of sums of several AR(1) processes. Let $\{X_k\}$ be
a stationary AR(1) process, i.e. for all $k \in \mathbb{Z}$,

$$X_k = a X_{k-1} + \epsilon_k ,$$

with $|a| < 1$. This process has spectral density function
$f(\lambda) = (2\pi)^{-1} \sigma^2 |1 - a e^{i\lambda}|^{-2}$ for $-\pi \le \lambda \le \pi$,
with $\sigma^2 = \operatorname{var}(\epsilon_k)$, and thus $f$ is decreasing on $[0, \pi]$ when
$0 < a < 1$. If $X^{(1)}, \dots, X^{(p)}$ are independent AR(1) processes with coefficients
$a_j$ such that $|a_j| < 1$, $j = 1, \dots, p$, and we define the process $X$ by

$$X_k = \sum_{j=1}^{p} X_k^{(j)} ,$$

then $X$ has spectral density
$f(\lambda) = (2\pi)^{-1} \sum_{j=1}^{p} \sigma_j^2 |1 - a_j e^{i\lambda}|^{-2}$, which is
decreasing on $[0, \pi]$ when all $a_j$ are positive, since it is then a sum of decreasing
functions. Assuming that we do not know how many AR(1) processes are summed, we have a
nonparametric problem: estimate a monotone spectral density.
Figure 1 shows a plot of the periodogram, the true spectral density and the nonparametric
estimators $\hat f_n$ and $\tilde f_n$ for simulated data from a sum of three independent AR(1)
processes with $a_1 = 0.5$, $a_2 = 0.7$, $a_3 = 0.9$. Figure 2 shows the pointwise means and 95%
confidence intervals of $\hat f_n$ and $\tilde f_n$ for 1000 realizations.
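The monotonicity claimed in Example 1 is easy to verify numerically. The following sketch assumes the AR(1) spectral density $(2\pi)^{-1}\sigma^2|1 - a e^{i\lambda}|^{-2}$ with unit innovation variances (an assumption made for illustration) and evaluates the density of the sum on a grid; the function names are hypothetical. It also evaluates a single AR(1) density with a negative coefficient, for which the density increases on $[0, \pi]$.

```python
import cmath
import math


def ar1_density(lam, a, sigma2=1.0):
    """Spectral density sigma2 / (2 pi) * |1 - a exp(i lam)|^(-2) of an AR(1) process."""
    return sigma2 / (2.0 * math.pi) / abs(1.0 - a * cmath.exp(1j * lam)) ** 2


def sum_density(lam, coeffs):
    """Spectral density of a sum of independent AR(1) processes (unit variances)."""
    return sum(ar1_density(lam, a) for a in coeffs)


grid = [math.pi * k / 200 for k in range(201)]  # grid on [0, pi]

# Coefficients used for Figure 1 of the example.
values = [sum_density(lam, (0.5, 0.7, 0.9)) for lam in grid]
is_decreasing = all(u > v for u, v in zip(values, values[1:]))

# With a negative coefficient the AR(1) density increases on [0, pi].
negative_case = [ar1_density(lam, -0.5) for lam in grid]
is_increasing = all(u < v for u, v in zip(negative_case, negative_case[1:]))

print(is_decreasing, is_increasing)
```

The check on the negative coefficient shows why positivity of the coefficients matters for the monotone model to apply.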

Example 2. The second example is a sum of an ARFIMA(0,d,0) process and an AR(1)
process. Let $X^{(1)}$ be an ARFIMA(0,d,0) process with $0 < d < 1/2$. This has spectral density
$(2\pi)^{-1} \sigma_1^2 |1 - e^{i\lambda}|^{-2d}$. If we add an independent AR(1) process
$X^{(2)}$ with coefficient $|a| < 1$, the resulting process $X = X^{(1)} + X^{(2)}$ will have
spectral density
$f(\lambda) = (2\pi)^{-1} \sigma_1^2 |1 - e^{i\lambda}|^{-2d} + (2\pi)^{-1} \sigma_2^2 |1 - a e^{i\lambda}|^{-2}$
on $[0, \pi]$, and thus the resulting spectral density $f$ will be a monotone
function on $[0, \pi]$. As above, if an unknown number of independent processes is added we
have a nonparametric estimation problem. Figure 3 shows a plot of the periodogram, the
true spectral density and the nonparametric estimators $\hat f_n$ and $\tilde f_n$ for
simulated time series data from a sum of an ARFIMA(0,d,0) process with $d = 0.2$ and an AR(1)
process with $a = 0.5$. Figure 4 shows the pointwise means and 95% confidence intervals of
$\hat f_n$ and $\tilde f_n$ for 1000 realizations.

Table 1 shows mean square root of sum of squares errors (comparing with the true function),
calculated on 1000 simulated samples of the time series of Example 1. Table 2 shows the
analogous values for Example 2. Both estimators $\hat f_n$ and $\tilde f_n$ seem to have good
finite sample properties. As indicated by the theory, $\tilde f_n$ seems to be less efficient
than $\hat f_n$.



MISE            n = 100    n = 500    n = 1000   n = 5000
$I_n$             9.59      12.96      13.67      14.25
$\hat f_n$        6.38       5.48       4.76       2.95
$\tilde f_n$      9.11       8.52       7.27       4.26

Table 1
MISE values for Example 1



MISE            n = 100    n = 500    n = 1000   n = 5000
$I_n$             1.80       1.99       2.02       2.07
$\hat f_n$        0.710      0.520      0.432      0.305
$\tilde f_n$      1.12       0.803      0.659      0.472

Table 2
MISE values for Example 2
Fig 1. The spectral density (red), the periodogram (black), the estimates $\hat f_n$ (green) and
$\tilde f_n$ (yellow), for n=100, 500, 1000, and 5000 data points, for Example 1.




Fig 2. Left plot: Spectral density (black), pointwise mean of the estimates $\hat f_n$ (red) and
95% confidence intervals (green). Right plot: Spectral density (black), pointwise mean of the
estimates $\tilde f_n$ (red) and 95% confidence intervals (green), for n=1000 data points, for
Example 1.

Fig 3. The spectral density (red), the periodogram (black), the estimates $\hat f_n$ (green) and
$\tilde f_n$ (yellow), for n=100, 500, 1000, and 5000 data points, for Example 2.

Fig 4. Left plot: Spectral density (black), pointwise mean of the estimates $\hat f_n$ (red) and
95% confidence intervals (green). Right plot: Spectral density (black), pointwise mean of the
estimates $\tilde f_n$ (red) and 95% confidence intervals (green), for n=1000 data points, for
Example 2.
6. Proof of Theorems 5 and 6. Let $J_n$ be the integral of the generic preliminary
estimator of the spectral density, that is the integral of $I_n$ or of $\log(I_n)$, let $K$
denote $F$ or the primitive of $\log f$, respectively, and write

(13) $$J_n(t) = K(t) + v_n(t) .$$

Let $d_n \downarrow 0$ be a deterministic sequence and define the rescaled process and rescaled
centering

(14) $$\tilde v_n(s; t_0) = d_n^{-2} \{v_n(t_0 + s d_n) - v_n(t_0)\} ,$$

(15) $$g_n(s) = d_n^{-2} \{K(t_0 + s d_n) - K(t_0) - K'(t_0) d_n s\} .$$

Consider the following conditions.

(AH1) There exists a stochastic process $\tilde v(\cdot; t_0)$ such that

(16) $$\tilde v_n(\cdot; t_0) \xrightarrow{\mathcal{L}} \tilde v(\cdot; t_0)$$

in $D(-\infty, \infty)$, endowed with the topology generated by the supnorm metric on
compact intervals, as $n \to \infty$.

(AH2) For each $\epsilon, \delta > 0$ there is a finite $\tau$ such that

(17) $$\limsup_{n \to \infty} P\left( \sup_{|s| \ge \tau} \left| \frac{\tilde v_n(s; t_0)}{g_n(s)} \right| > \epsilon \right) < \delta ,$$

(18) $$P\left( \sup_{|s| \ge \tau} \left| \frac{\tilde v(s; t_0)}{s^2} \right| > \epsilon \right) < \delta .$$

(AH3) There is a constant $A < 0$ such that for each $c > 0$,

(19) $$\lim_{n \to \infty} \sup_{|s| \le c} |g_n(s) - A s^2| = 0 ;$$

(AH4) For each $a \in \mathbb{R}$ and $c, \epsilon > 0$,

(20) $$P(\tilde v(s; t_0) - \tilde v(0; t_0) + A s^2 - a s \ge \epsilon |s| \text{ for all } s \in [-c, c]) = 0 .$$
If there exists a sequence $d_n$ such that these four conditions hold, then, defining the process
$y$ by $y(s) = \tilde v(s; t_0) + A s^2$, by Anevski and Hössjer [1, Theorems 1 and 2], as
$n \to \infty$, it holds that

(21) $$d_n^{-1} \{T(J_n)'(t_0) - K'(t_0)\} \xrightarrow{\mathcal{L}} T(y)'(0) ,$$

where $T(y)'(0)$ denotes the slope at zero of the smallest concave majorant of $y$.

6.1. Proof of Theorem 5. The proof consists in checking Conditions (AH1)-(AH4) with
$J_n = F_n$ and $K = F$.

- It is proved in Lemma 8 below that (16) holds with $d_n = n^{-1/3}$ and $\tilde v(\cdot; t_0)$
the standard two sided Brownian motion times $\sqrt{2\pi} f(t_0)$.

- If $f'(t_0) < 0$, then (19) holds with $A = \frac{1}{2} f'(t_0)$ and $d_n \downarrow 0$ an
arbitrary deterministic sequence.

- Lemma 9 shows that (17) holds and the law of the iterated logarithm yields that (18) holds for
the two-sided Brownian motion.

- Finally, (20) also holds for the two sided Brownian motion.

Thus (21) holds with the process $y$ defined by

$$y(s) = \frac{1}{2} f'(t_0) s^2 + \sqrt{2\pi} f(t_0) W(s) .$$

The scaling property of the Brownian motion yields the representation of $T(y)'(0)$ in terms
of Chernoff's distribution. □
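The scaling step invoked at the end of this proof can be spelled out as follows. This is a sketch under assumed notation that is not in the original: $a = \sqrt{2\pi} f(t_0)$ and $b = -f'(t_0)/2 > 0$, so that the limit process is $y(s) = aW(s) - bs^2$. It also uses the standard fact that the slope at zero of the least concave majorant of $W(u) - u^2$ has the same law as $2\zeta$.

```latex
% Assumed notation: a = \sqrt{2\pi} f(t_0), b = -f'(t_0)/2 > 0, y(s) = a W(s) - b s^2.
% Change of variable s = c u with c = (a/b)^{2/3}; Brownian scaling gives
% W(cu) = c^{1/2} W(u) in law, and c is chosen so that a c^{1/2} = b c^2.
\[
  y(cu) \overset{d}{=} a c^{1/2} \{ W(u) - u^2 \} .
\]
% A slope with respect to s is 1/c times the slope with respect to u, hence
\[
  T(y)'(0) \overset{d}{=} a c^{-1/2} \, T(W - u^2)'(0)
           \overset{d}{=} a^{2/3} b^{1/3} \cdot 2 \zeta
           = 2 \{ a^2 b \}^{1/3} \zeta
           = 2 \{ -\pi f^2(t_0) f'(t_0) \}^{1/3} \zeta .
\]
```

The last equality uses $a^2 b = 2\pi f^2(t_0) \cdot (-f'(t_0)/2)$, which recovers the constant stated in Theorem 5.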

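Lemma 8. Assume the process $\{X_n\}$ is given by (10), that (11) holds and that
$E[\epsilon_0^8] < \infty$. If $d_n = n^{-1/3}$, then the sequence of processes
$\tilde v_n(\cdot; t_0)$ defined in (14) converges weakly in $C([-c, c])$ to
$\sqrt{2\pi} f(t_0) W$, where $W$ is a standard two sided Brownian motion.

Proof. For clarity, we omit $t_0$ in the notation. Write

$$\tilde v_n(s) = \tilde v_n^{\epsilon}(s) + R_n(s)$$

with

(22) $$\tilde v_n^{\epsilon}(s) = d_n^{-2} \int_{t_0}^{t_0 + d_n s} f(u) \{I_n^{\epsilon}(u) - 1\} \, du , \qquad I_n^{\epsilon}(u) = \frac{1}{n} \left| \sum_{k=1}^{n} \epsilon_k e^{iku} \right|^2 ,$$

(23) $$R_n(s) = d_n^{-2} \int_{t_0}^{t_0 + d_n s} r_n(u) \, du , \qquad r_n(u) = I_n(u) - f(u) I_n^{\epsilon}(u) .$$

Note that $(2\pi)^{-1} I_n^{\epsilon}$ is the periodogram for the white noise sequence
$\{\epsilon_i\}$. We first treat the remainder term $R_n$. Denote
$\mathcal{G} = \{g : \int_{-\pi}^{\pi} g^2(u) f^2(u) \, du < \infty\}$. Equation (5.11) (with a
typo in the normalization) in [18] states that if (11) and $E[\epsilon_0^8] < \infty$ hold, then

(24) $$\int_{-\pi}^{\pi} g(x) r_n(x) \, dx = o_P(1) .$$

Define the set $\tilde{\mathcal{G}} = \{k_n(\cdot, s) f : n \in \mathbb{N}, s \in [-c, c]\}$.
Since $f$ is bounded, we have that $\int k_n^2(u, s) f^2(u) \, du < \infty$, so
$\tilde{\mathcal{G}} \subset \mathcal{G}$ and we can apply (24) on $\tilde{\mathcal{G}}$, which
shows that $R_n$ converges uniformly (over $s \in [-c, c]$) to zero. We next show that

(25) $$\tilde v_n^{\epsilon}(s) \xrightarrow{\mathcal{L}} \sqrt{2\pi} f(t_0) W(s) ,$$

as $n \to \infty$, on $C(\mathbb{R})$, where $W$ is a standard two sided Brownian motion. Since
$\{\epsilon_k\}$ is a white noise sequence, we set $t_0 = 0$ without loss of generality.
Straightforward algebra yields

(26) $$\tilde v_n^{\epsilon}(s) = d_n^{-2} \{\hat\gamma_n(0) - 1\} F(d_n s) + S_n(s)$$

with

$$\hat\gamma_n(0) = n^{-1} \sum_{j=1}^{n} \epsilon_j^2 , \qquad S_n(s) = \sum_{k=2}^{n} C_k(s) \epsilon_k ,$$

$$C_k(s) = d_n^{3/2} \sum_{j=1}^{k-1} \alpha_j(s) \epsilon_{k-j} , \qquad \alpha_j(s) = d_n^{-1/2} \int_{-d_n s}^{d_n s} f(u) e^{iju} \, du .$$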
j—l J —d n s 
Since $\{\epsilon_j\}$ is a white noise sequence with finite fourth moment, it is easily
checked that

(27) $$n \operatorname{var}(\hat\gamma_n(0)) = \operatorname{var}(\epsilon_0^2) ,$$

$$\sup_{s \in [-c, c]} d_n^{-2} \int_0^{d_n s} f(u) \, du \, |\hat\gamma_n(0) - 1| = O_P(d_n^{-1} n^{-1/2}) = O_P(\sqrt{d_n}) ,$$

so that the first term in (26) is negligible. It remains to prove that the sequence of processes
$S_n$ converges weakly to $\sqrt{2\pi} f(0) W$. We prove the convergence of the finite
dimensional distributions by application of the martingale central limit theorem, cf. for
instance Hall and Heyde [9, Corollary 3.1]. It is sufficient to check the following conditions:

(28) $$\lim_{n \to \infty} \sum_{k=2}^{n} E[C_k^2(s)] = 2\pi f^2(0) s ,$$

(29) $$\lim_{n \to \infty} \sum_{k=2}^{n} E[C_k^4(s)] = 0 .$$

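By the Parseval-Bessel identity, we have

$$\sum_{j=-\infty}^{\infty} \alpha_j^2(s) = 2\pi d_n^{-1} \int_{-d_n s}^{d_n s} f^2(u) \, du \sim 4\pi f^2(0) s .$$

Since $\alpha_0(s) \sim 2 f(0) s \sqrt{d_n}$, this implies that

$$\sum_{k=2}^{n} E[C_k^2(s)] = n^{-1} \sum_{j=1}^{n-1} (n - j) \alpha_j^2(s) \sim \sum_{j=1}^{\infty} \alpha_j^2(s) \sim 2\pi f^2(0) s .$$

This proves Condition (28). For the asymptotic negligibility condition (29), we use Rosenthal's
inequality (cf. Hall and Heyde [9, Theorem 2.12]),

$$E[C_k^4(s)] \le \mathrm{cst} \, n^{-2} \sum_{j=1}^{k-1} \alpha_j^4(s) + \mathrm{cst} \, n^{-2} \left( \sum_{j=1}^{k-1} \alpha_j^2(s) \right)^2 = O(n^{-2}) ,$$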



implying $\sum_{k=1}^{n} E[C_k^4(s)] = O(n^{-1})$, which proves (29). To prove tightness, we
compute the fourth moment of the increments of $S_n$. Write

$$S_n(s) - S_n(s') = n^{-1/2} \sum_{k=1}^{n} \sum_{j=1}^{k-1} \alpha_j(s, s') \epsilon_{k-j} \epsilon_k ,$$

with

$$\alpha_j(s, s') = d_n^{-1/2} \int_{d_n s'}^{d_n s} f(u) e^{iju} \, du + d_n^{-1/2} \int_{-d_n s}^{-d_n s'} f(u) e^{iju} \, du .$$

By Parseval's inequality, it holds that

$$\sum_{j=1}^{\infty} \alpha_j^2(s, s') \le C |s - s'| .$$

Applying again Rosenthal's inequality, we obtain that $E[|S_n(s) - S_n(s')|^4]$ is bounded by a
constant times

n / n \ 

i=i V^ 1 / 

Applying [2, Theorem 15.6] concludes the proof of tightness. □ 

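Lemma 9. For any $\delta > 0$ and any $\kappa > 0$, there exists $\tau$ such that

(30) $$\limsup_{n \to \infty} P\left( \sup_{|s| \ge \tau} \frac{|\tilde v_n(s)|}{|s|} > \kappa \right) \le \delta .$$

Proof. Without loss of generality, we can assume that $f(t_0) = 1$. Recall that
$\tilde v_n = \tilde v_n^{\epsilon} + R_n$ and
$\tilde v_n^{\epsilon}(s) = F(d_n s) \zeta_n + S_n(s)$, where $\tilde v_n^{\epsilon}$ and $R_n$
are defined in (22) and (23), $\zeta_n = d_n^{-2} (\hat\gamma_n(0) - 1)$ and $S_n$ is defined in
(26). Then

$$P\left( \sup_{s \ge \tau} \frac{|\tilde v_n(s)|}{s} > \kappa \right) \le P\left( \sup_{s \ge \tau} |\zeta_n| F(d_n s)/s > \kappa/3 \right) + P\left( \sup_{s \ge \tau} \frac{|S_n(s)|}{s} > \kappa/3 \right) + P\left( \sup_{s \ge \tau} \frac{|R_n(s)|}{s} > \kappa/3 \right) .$$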

The spectral density is bounded, so $F(d_n s)/s \le C d_n$ for all $s$. Since
$\operatorname{var}(\zeta_n) = O(d_n^{-1})$, by (27) and the Bienaymé-Chebyshev inequality, we get

$$P\left( \sup_{s \ge \tau} |\zeta_n| F(d_n s)/s > \kappa \right) \le O(d_n^{-1} d_n^{2}) = O(d_n) .$$

Let $\{s_j, j \ge 0\}$ be an increasing sequence. Then we have the bound

$$P\left( \sup_{s \ge s_0} \frac{|S_n(s)|}{s} > \kappa \right) \le \sum_{j=0}^{\infty} P(|S_n(s_j)| > \kappa s_j) + \sum_{j=1}^{\infty} P\left( \sup_{s_{j-1} \le s \le s_j} |S_n(s) - S_n(s_{j-1})| > \kappa s_{j-1} \right) .$$

From (28), we know that $\operatorname{var}(S_n(s)) = O(s)$. Thus

$$\sum_{j=0}^{\infty} P(|S_n(s_j)| > \kappa s_j) \le \mathrm{cst} \, \kappa^{-2} \sum_{j=0}^{\infty} s_j^{-1} .$$

Thus if the series $s_j^{-1}$ is summable, this sum can be made arbitrarily small by choosing
$s_0$ large enough. It was shown in the proof of Lemma 8 that

$$E[|S_n(s) - S_n(s')|^4] \le C |s - s'|^2 .$$

By Billingsley [2, Theorem 15.6] (or more specifically Ledoux and Talagrand [14,
Theorem 11.1]), this implies that

$$P\left( \sup_{s_{j-1} \le s \le s_j} |S_n(s) - S_n(s_{j-1})| > \kappa s_{j-1} \right) \le \frac{C (s_j - s_{j-1})^2}{\kappa^4 s_{j-1}^4} .$$

Thus choosing $s_j = (s_0 + j)^{\rho}$ for some $\rho > 1$ implies that the series is convergent
and

$$P\left( \sup_{s \ge s_0} \frac{|S_n(s)|}{s} > \kappa \right) = O(s_0^{-1}) ,$$

which is arbitrarily small for large $s_0$.

To deal with the remainder term $R_n$, we prove that
$P(\sup_{s \ge s_0} |R_n(s)|/s > s_0) = o_P(1)$ by the same method as that used for $S_n$. Thus
we only need to obtain a suitable bound for the increments of $R_n$. By definition of $R_n$, we
have, for $s < s'$,

$$R_n(s') - R_n(s) = d_n^{-2} \int_{t_0 + d_n s}^{t_0 + d_n s'} f(u) r_n(u) \, du .$$

Since $f$ is bounded, by Hölder's inequality, we get

$$E[|R_n(s') - R_n(s)|^2] \le \|f\|_\infty^2 \, n (s' - s) \int_{t_0 + d_n s}^{t_0 + d_n s'} E[r_n^2(u)] \, du .$$

Under (11), it is known (see e.g. Brockwell and Davis [3, Theorem 10.3.1]) that

$$E[r_n^2(u)] \le C n^{-1} .$$

Hence,

$$E[|R_n(s') - R_n(s)|^2] \le C d_n (s' - s)^2 .$$

The rest of the proof is similar to the proof for $S_n$. This concludes the proof of (30). □

6.2. Sketch of Proof of Theorem 6. The proof consists in checking Conditions (AH1)-(AH4)
with $J_n$ and $K$ now defined by $J_n(t) = \int_0^t \{\log I_n(s) + \gamma\} \, ds$ and
$K(t) = \int_0^t \log f(2\pi [ns/2\pi]/n) \, ds$. Let $\lambda_k = 2k\pi/n$ denote the so-called
Fourier frequencies. For $t \in [0, \pi]$, denote $k_n(t) = [nt/2\pi]$. Denote

$$\xi_k = \log\{I_n(\lambda_k)/f(\lambda_k)\} + \gamma ,$$

where $\gamma$ is Euler's constant. Then

$$v_n(t) = J_n(t) - K(t) = \frac{2\pi}{n} \sum_{j=1}^{k_n(t)} \xi_j + (t - \lambda_{k_n(t)}) \xi_{k_n(t)} .$$

The log-periodogram ordinates $\xi_j$ are not independent, but sums of log-periodogram
ordinates, such as the one above, behave asymptotically as sums of independent random variables
with zero mean and variance $\pi^2/6$ (cf. [26]), and bounded moments of all orders. Thus, for
$t_0 \in (0, \pi)$, the process
$\tilde v_n(s; t_0) = d_n^{-2} \{v_n(t_0 + d_n s) - v_n(t_0)\}$ with $d_n = n^{-1/3}$ converges
weakly in $D(-\infty, \infty)$ to the two-sided Brownian motion with variance $2\pi^4/3$. It can
be shown by using the moment bounds of [26] that (17) holds. Finally, if $f$ is differentiable
at $t_0$, it is easily seen that $d_n^{-2} \{K(t_0 + d_n s) - K(t_0) - d_n s K'(t_0)\}$ converges
to $\frac{1}{2} A s^2$ with $A = f'(t_0)/f(t_0)$. □